ESS 330 - Daily Exercise 21

Author

Andie Hall

Published

April 21, 2025

Daily Assignment 21

Library Code

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tidymodels)
── Attaching packages ────────────────────────────────────── tidymodels 1.2.0 ──
✔ broom        1.0.7     ✔ rsample      1.2.1
✔ dials        1.3.0     ✔ tune         1.2.1
✔ infer        1.0.7     ✔ workflows    1.1.4
✔ modeldata    1.4.0     ✔ workflowsets 1.1.0
✔ parsnip      1.2.1     ✔ yardstick    1.3.2
✔ recipes      1.1.1     
── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
✖ scales::discard() masks purrr::discard()
✖ dplyr::filter()   masks stats::filter()
✖ recipes::fixed()  masks stringr::fixed()
✖ dplyr::lag()      masks stats::lag()
✖ yardstick::spec() masks readr::spec()
✖ recipes::step()   masks stats::step()
• Dig deeper into tidy modeling with R at https://www.tmwr.org
library(tsibble)
Registered S3 method overwritten by 'tsibble':
  method               from 
  as_tibble.grouped_df dplyr

Attaching package: 'tsibble'

The following object is masked from 'package:lubridate':

    interval

The following objects are masked from 'package:base':

    intersect, setdiff, union
library(feasts)
Loading required package: fabletools

Attaching package: 'fabletools'

The following object is masked from 'package:yardstick':

    accuracy

The following object is masked from 'package:parsnip':

    null_model

The following objects are masked from 'package:infer':

    generate, hypothesize
library(dataRetrieval)
library(plotly)

Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout

Data Import

# Example: Cache la Poudre River at Mouth (USGS site 06752260)
poudre_flow <- readNWISdv(siteNumber = "06752260",    # Download data from USGS for site 06752260
                          parameterCd = "00060",      # Parameter code 00060 = discharge in cfs
                          startDate = "2013-01-01",   # Set the start date
                          endDate = "2023-12-31") |>  # Set the end date
  renameNWISColumns() |>                              # Rename columns to standard names (e.g., "Flow", "Date")
  mutate(Date = yearmonth(Date)) |>                   # Convert daily Date values into a year-month format (e.g., "2023 Jan")
  group_by(Date) |>                                   # Group the data by the new monthly Date
  summarise(Flow = mean(Flow))                       # Calculate the average daily flow for each month
GET:https://waterservices.usgs.gov/nwis/dv/?site=06752260&format=waterml%2C1.1&ParameterCd=00060&StatCd=00003&startDT=2013-01-01&endDT=2023-12-31

Converting Data Frame and Extracting Components

# Convert to a tsibble (time-series tibble) indexed by the monthly Date
tibble_poudre <- as_tibble(poudre_flow) |> 
  as_tsibble(index = Date)

# Creating Model
poudre_decomp <- tibble_poudre |>
  model(STL(Flow ~ season(window = "periodic")))

# Extracting Components
poudre_components <- components(poudre_decomp)

Plotting the Time Series Analysis

# Plotting Data Series 
poudre_plot <- poudre_components |> 
  autoplot() +
  labs(title = "STL Decomposition of Poudre Flow",
       y = "Flow (cfs)", x = "Date") +
  theme_minimal()

poudre_plot

# Making the Plot Interactive
poudre_plotly <- ggplotly(poudre_plot)

poudre_plotly

Visualizing Seasonal Patterns

poudre_subseries <- tibble_poudre |> 
  gg_subseries(Flow) +
  labs(title = "Seasonal Subseries Plot of Poudre River Flow",
       y = "Average Flow (cfs)", x = "Month") +
  theme_minimal()

poudre_subseries

Data Analysis

Each of the time series plots clearly shows a spike in flow at this gauge around May and June. This makes sense, as this is when snowmelt runoff and spring precipitation peak in Northern Colorado. There is also an overall decrease in flow across the years, suggesting less precipitation available to initiate flow. The subseries plot sorts the data by month rather than displaying it linearly, and reveals whether any outliers might skew the seasonal analysis; we can see this specifically in September and, to a lesser extent, in April. With the data grouped by month, we can read off the average flow for each month and easily identify the season with the highest flow.
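The May–June peak described above can also be checked numerically by averaging flow over calendar months (a quick sketch; it assumes the `tibble_poudre` object created earlier in this document):

```r
# Average flow by calendar month across 2013-2023, highest first.
# as.Date() converts the yearmonth index back to a Date so month() can label it.
tibble_poudre |>
  as_tibble() |>
  mutate(month = month(as.Date(Date), label = TRUE)) |>
  group_by(month) |>
  summarise(mean_flow = mean(Flow)) |>
  arrange(desc(mean_flow))
```

If the seasonal pattern in the plots is real, May and June should sit at the top of this table.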

Daily Assignment 22

Library Code

library(modeltime)
library(tidymodels)
library(timetk)
library(yardstick)
library(dplyr)
library(lubridate)

Splitting Data for Models

tibble_poudre_fixed <- tibble_poudre %>%
    mutate(Date = lubridate::ym(Date)) %>%
    arrange(Date) %>%
    ungroup()

tibble_poudre_fixed <- tibble_poudre_fixed %>% as_tibble()

splits <- time_series_split(tibble_poudre_fixed, assess = 12, cumulative = TRUE)
Using date_var: Date
training <-  training(splits)
testing  <-  testing(splits)

Models

model_prophet <- prophet_reg(
    seasonality_yearly = TRUE
) %>%
    set_engine("prophet")

model_arima <- arima_reg() %>%
    set_engine("auto_arima")

model_fit_prophet <- model_prophet %>%
    fit(Flow ~ Date, data = training)
Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
model_fit_arima <- model_arima %>%
    fit(Flow ~ Date, data = training)
frequency = 12 observations per 1 year
models_table <- modeltime_table(
    model_fit_prophet,
    model_fit_arima
)

calibration_tbl <- models_table %>%
    modeltime_calibrate(new_data = testing)

future_forecast_tbl <- models_table %>%
    modeltime_refit(data = tibble_poudre_fixed) %>%   
    modeltime_forecast(h = "12 months", actual_data = tibble_poudre_fixed)
Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
frequency = 12 observations per 1 year
future_forecast_tbl %>%
    plot_modeltime_forecast(.interactive = FALSE)
Warning: ✖ Expecting the following names to be in the data frame: .conf_hi, .conf_lo.
ℹ Proceeding with '.conf_interval_show = FALSE' to visualize the forecast without confidence intervals.
Alternatively, try using `modeltime_calibrate()` before forecasting to add confidence intervals.
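The warning above can be avoided by refitting the *calibrated* table rather than the raw model table, since the calibration residuals are what modeltime uses to build confidence intervals (a sketch of the standard workflow; it assumes `calibration_tbl` and `tibble_poudre_fixed` from the chunks above):

```r
# Refit the calibrated table on the full data, then forecast.
# Because calibration data are carried along, the forecast now
# includes .conf_lo / .conf_hi confidence-interval columns.
refit_tbl <- calibration_tbl %>%
    modeltime_refit(data = tibble_poudre_fixed)

refit_tbl %>%
    modeltime_forecast(h = "12 months",
                       actual_data = tibble_poudre_fixed) %>%
    plot_modeltime_forecast(.interactive = FALSE)
```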

Importing the Actual Streamflows

poudre_flow_2024 <- readNWISdv(siteNumber = "06752260",    # Download data from USGS for site 06752260
                          parameterCd = "00060",      # Parameter code 00060 = discharge in cfs
                          startDate = "2024-01-01",   # Set the start date
                          endDate = "2024-12-31") |>  # Set the end date
  renameNWISColumns() |>                              # Rename columns to standard names (e.g., "Flow", "Date")
  mutate(Date = yearmonth(Date)) |>                   # Convert daily Date values into a year-month format (e.g., "2024 Jan")
  group_by(Date) |>                                   # Group the data by the new monthly Date
  summarise(Flow = mean(Flow))                        # Calculate the average daily flow for each month
GET:https://waterservices.usgs.gov/nwis/dv/?site=06752260&format=waterml%2C1.1&ParameterCd=00060&StatCd=00003&startDT=2024-01-01&endDT=2024-12-31
combined_poudre <- bind_rows(poudre_flow, poudre_flow_2024)

combined_tibble_poudre <- as_tibble(combined_poudre) |> 
  as_tsibble(index = Date)

Comparing the Models

# Filter only 2024 predictions (compare as Date so the classes match
# and the ">=.Date / >=.vctrs_vctr" warning does not appear)
predictions_2024 <- future_forecast_tbl %>%
    filter(.index >= as.Date("2024-01-01")) %>%
    select(.index, .value, .model_desc)

actual_2024 <- poudre_flow_2024 %>%
    transmute(.index = as.Date(Date),
              actual_flow = Flow)

comparison_tbl <- left_join(predictions_2024, actual_2024, by = ".index")

comparison_tbl %>%
    group_by(.model_desc) %>%
    summarise(
        RMSE = rmse_vec(truth = actual_flow, estimate = .value),
        MAE  = mae_vec(truth = actual_flow, estimate = .value),
        MAPE = mape_vec(truth = actual_flow, estimate = .value)
    )
# A tibble: 2 × 4
  .model_desc                      RMSE   MAE  MAPE
  <chr>                           <dbl> <dbl> <dbl>
1 PROPHET                          226.  164.  293.
2 UPDATE: ARIMA(0,0,2)(0,1,1)[12]  208.  110.  117.
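An alternative to computing the metrics by hand with `rmse_vec()` and friends is modeltime's built-in accuracy summary on the held-out test set (a sketch; it assumes the `calibration_tbl` created earlier):

```r
# modeltime_accuracy() reports standard metrics (mae, mape, rmse, rsq, ...)
# for every model in the calibrated table at once.
calibration_tbl %>%
    modeltime_accuracy() %>%
    select(.model_desc, rmse, mae, mape)
```

Note these values are computed on the 12-month test split, whereas the table above compares refit forecasts against the newly downloaded 2024 observations, so the numbers will differ.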

Plotting the Models

library(ggplot2)

comparison_tbl %>%
    ggplot(aes(x = .index)) +
    geom_line(aes(y = actual_flow), color = "black", linewidth = 1, linetype = "dashed") +
    geom_line(aes(y = .value, color = .model_desc), linewidth = 1) +
    labs(title = "Actual vs Predicted Streamflows (2024)",
         y = "Flow (cfs)",
         x = "Month",
         color = "Model") +
    theme_minimal()

Fitting the Linear Model

# For one model at a time (example: Prophet model)
prophet_comparison <- comparison_tbl %>%
    filter(.model_desc == "PROPHET")

# Fit linear model: actual vs predicted
lm_prophet <- lm(actual_flow ~ .value, data = prophet_comparison)

summary(lm_prophet)$r.squared
[1] 0.85037
arima_comparison <- comparison_tbl %>%
    filter(.model_desc == "UPDATE: ARIMA(0,0,2)(0,1,1)[12]")

lm_arima <- lm(actual_flow ~ .value, data = arima_comparison)

glance(lm_arima) %>%
    select(r.squared)
# A tibble: 1 × 1
  r.squared
      <dbl>
1     0.919

Plotting

# Compute R² for both models
lm_prophet <- lm(actual_flow ~ .value, data = comparison_tbl %>% filter(.model_desc == "PROPHET"))
lm_arima <- lm(actual_flow ~ .value, data = comparison_tbl %>% filter(.model_desc == "UPDATE: ARIMA(0,0,2)(0,1,1)[12]"))

r_squared_prophet <- summary(lm_prophet)$r.squared
r_squared_arima <- summary(lm_arima)$r.squared

# Plot for both models
comparison_tbl %>%
    ggplot(aes(x = .value, y = actual_flow, color = .model_desc)) +
    geom_point(size = 3) +
    geom_abline(intercept = 0, slope = 1, color = "black", linetype = "dashed", linewidth = 1) +
    geom_smooth(method = "lm", se = FALSE) +
    annotate("text", x = max(comparison_tbl$.value) * 0.7, 
             y = max(comparison_tbl$actual_flow) * 0.9, 
             label = paste("R² (Prophet) = ", round(r_squared_prophet, 3)), color = "blue", size = 5) +
    annotate("text", x = max(comparison_tbl$.value) * 0.7, 
             y = max(comparison_tbl$actual_flow) * 0.85, 
             label = paste("R² (ARIMA) = ", round(r_squared_arima, 3)), color = "red", size = 5) +
    labs(
        title = "Predicted vs Observed Streamflow (All Models)",
        x = "Predicted Flow (cfs)",
        y = "Observed Flow (cfs)",
        color = "Model"
    ) +
    theme_minimal()
`geom_smooth()` using formula = 'y ~ x'

Model Analysis

While both models achieve high R² values in the linear fits, the scatter plot shows their predictions still deviate substantially from the 1:1 line of the observed flows. ARIMA has the higher R² (0.92 versus 0.85 for Prophet), along with lower RMSE, MAE, and MAPE, indicating the more accurate model. Both models missed the magnitude of the observed peak, consistent with the comparison against the 2024 flows, but each captured the timing of the seasonal spikes. Overall, ARIMA is the better model for this prediction and produced a fairly accurate forecast.